AITopics | open-vocabulary semantic segmentation

Collaborating Authors

open-vocabulary semantic segmentation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

2093ed77c549eda95bd6f7212b735b43-Paper-Conference.pdf

Neural Information Processing SystemsFeb-8-2026, 19:59:19 GMT

large language model, machine learning, segmentation, (19 more...)

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Media > Film (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
(2 more...)

Add feedback

TowardsOpen-VocabularySemanticSegmentation WithoutSemanticLabels

Neural Information Processing SystemsFeb-8-2026, 02:25:15 GMT

Recently, several studies [11, 12, 7, 8] have pioneered open-vocabulary semantic segmentation without densely-annotated semantic labels.

machine learning, natural language, segmentation, (17 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Artificial Intelligence > Vision (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

AttrSeg: Open-Vocabulary Semantic Segmentation via Attribute Decomposition-Aggregation

Neural Information Processing SystemsDec-24-2025, 04:51:28 GMT

Open-vocabulary semantic segmentation is a challenging task that requires segmenting novel object categories at inference time. Recent works explore vision-language pre-training to handle this task, but suffer from unrealistic assumptions in practical scenarios, i.e., low-quality textual category names.For example, this paradigm assumes that new textual categories will be accurately and completely provided, and exist in lexicons during pre-training.However, exceptions often happen when meet with ambiguity for brief or incomplete names, new words that are not present in the pre-trained lexicons, and difficult-to-describe categories for users.To address these issues, this work proposes a novel framework, AttrSeg

attribute decomposition-aggregation, category, open-vocabulary semantic segmentation, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.61)

Add feedback

Structure-Aware Feature Rectification with Region Adjacency Graphs for Training-Free Open-Vocabulary Semantic Segmentation

Huang, Qiming, Ai, Hao, Jiao, Jianbo

arXiv.org Artificial IntelligenceDec-9-2025

Benefiting from the inductive biases learned from large-scale datasets, open-vocabulary semantic segmentation (OVSS) leverages the power of vision-language models, such as CLIP, to achieve remarkable progress without requiring task-specific training. However, due to CLIP's pre-training nature on image-text pairs, it tends to focus on global semantic alignment, resulting in suboptimal performance when associating fine-grained visual regions with text. This leads to noisy and inconsistent predictions, particularly in local areas. We attribute this to a dispersed bias stemming from its contrastive training paradigm, which is difficult to alleviate using CLIP features alone. To address this, we propose a structure-aware feature rectification approach that incorporates instance-specific priors derived directly from the image. Specifically, we construct a region adjacency graph (RAG) based on low-level features (e.g., colour and texture) to capture local structural relationships and use it to refine CLIP features by enhancing local discrimination. Extensive experiments show that our method effectively suppresses segmentation noise, improves region-level consistency, and achieves strong performance on multiple open-vocabulary segmentation benchmarks.

artificial intelligence, machine learning, segmentation, (13 more...)

arXiv.org Artificial Intelligence

2512.0736

Genre: Research Report (0.82)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

NERVE: Neighbourhood & Entropy-guided Random-walk for training free open-Vocabulary sEgmentation

Mahatha, Kunal, Dolz, Jose, Desrosiers, Christian

arXiv.org Artificial IntelligenceNov-12-2025

Despite recent advances in Open-V ocabulary Semantic Segmentation (OVSS), existing training-free methods face several limitations: use of computationally expensive affinity refinement strategies, ineffective fusion of transformer attention maps due to equal weighting or reliance on fixed-size Gaussian kernels to reinforce local spatial smoothness, enforcing isotropic neighborhoods. W e propose a strong baseline for training-free OVSS termed as NERVE (Neighbourhood & Entropy-guided Random-walk for open-V ocabulary sEgmentation), which uniquely integrates global and fine-grained local information, exploiting the neighbourhood structure from the self-attention layer of a stable diffusion model. W e also introduce a stochastic random walk for refining the affinity rather than relying on fixed-size Gaussian kernels for local context. This spatial diffusion process encourages propagation across connected and semantically related areas, enabling it to effectively delineate objects with arbitrary shapes. Whereas most existing approaches treat self-attention maps from different transformer heads or layers equally, our method uses entropy-based uncertainty to select the most relevant maps. Notably, our method does not require any conventional post-processing techniques like Conditional Random Fields (CRF) or Pixel-Adaptive Mask Refinement (P AMR). Experiments are performed on 7 popular semantic segmentation benchmarks, yielding an overall state-of-the-art zero-shot segmentation performance, providing an effective approach to open-vocabulary semantic segmentation.

machine learning, natural language, segmentation, (15 more...)

arXiv.org Artificial Intelligence

2511.08248

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

RadZero: Similarity-Based Cross-Attention for Explainable Vision-Language Alignment in Chest X-ray with Zero-Shot Multi-Task Capability

Park, Jonggwon, Yoon, Byungmu, Kim, Soobum, Choi, Kyoyun

arXiv.org Artificial IntelligenceNov-7-2025

Recent advancements in multimodal models have significantly improved vision-language (VL) alignment in radiology. However, existing approaches struggle to effectively utilize complex radiology reports for learning and offer limited interpretability through attention probability visualizations. To address these challenges, we introduce $\textbf{RadZero}$, a novel framework for VL alignment in chest X-ray with zero-shot multi-task capability. A key component of our approach is $\textbf{VL-CABS}$ ($\textbf{V}$ision-$\textbf{L}$anguage $\textbf{C}$ross-$\textbf{A}$ttention $\textbf{B}$ased on $\textbf{S}$imilarity), which aligns text embeddings with local image features for interpretable, fine-grained VL reasoning. RadZero leverages large language models to extract concise semantic sentences from radiology reports and employs multi-positive contrastive training to effectively capture relationships between images and multiple relevant textual descriptions. It uses a pre-trained vision encoder with additional trainable Transformer layers, allowing efficient high-resolution image processing. By computing similarity between text embeddings and local image patch features, VL-CABS enables zero-shot inference with similarity probability for classification, and pixel-level VL similarity maps for grounding and segmentation. Experimental results on public chest radiograph benchmarks show that RadZero outperforms state-of-the-art methods in zero-shot classification, grounding, and segmentation. Furthermore, VL similarity map analysis highlights the potential of VL-CABS for improving explainability in VL alignment. Additionally, qualitative evaluation demonstrates RadZero's capability for open-vocabulary semantic segmentation, further validating its effectiveness in medical imaging. Code is available at $\href{https://github.com/deepnoid-ai/RadZero}{https://github.com/deepnoid-ai/RadZero}$.

large language model, machine learning, radzero, (20 more...)

arXiv.org Artificial Intelligence

2504.07416

Country:

North America > United States (0.14)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Europe > Switzerland (0.04)
(3 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels Heeseong Shin 1 Chaehyun Kim

Neural Information Processing SystemsOct-9-2025, 18:49:52 GMT

Large-scale vision-language models like CLIP have demonstrated impressive open-vocabulary capabilities for image-level tasks, excelling in recognizing what objects are present.

dataset, open-vocabulary semantic segmentation, semantic segmentation, (11 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Israel (0.04)
Europe > Poland (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
(2 more...)

Add feedback

2093ed77c549eda95bd6f7212b735b43-Paper-Conference.pdf

Neural Information Processing SystemsOct-8-2025, 06:32:08 GMT

large language model, machine learning, segmentation, (19 more...)

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Media > Film (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
(2 more...)

Add feedback

Relationship Prompt Learning is Enough for Open-Vocabulary Semantic Segmentation

Neural Information Processing SystemsMay-27-2025, 07:42:31 GMT

Open-vocabulary semantic segmentation (OVSS) aims to segment unseen classes without corresponding labels. Existing Vision-Language Model (VLM)-based methods leverage VLM's rich knowledge to enhance additional explicit segmentation-specific networks, yielding competitive results, but at the cost of extensive training cost. To reduce the cost, we attempt to enable VLM to directly produce the segmentation results without any segmentation-specific networks. Prompt learning offers a direct and parameter-efficient approach, yet it falls short in guiding VLM for pixel-level visual classification.

open-vocabulary semantic segmentation, relationship prompt learning, segmentation-specific network, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Vision (0.64)

Add feedback

Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels

Neural Information Processing SystemsMay-26-2025, 16:32:31 GMT

Large-scale vision-language models like CLIP have demonstrated impressive open-vocabulary capabilities for image-level tasks, excelling in recognizing what objects are present. However, they struggle with pixel-level recognition tasks like semantic segmentation, which require understanding where the objects are located. In this work, we propose a novel method, PixelCLIP, to adapt the CLIP image encoder for pixel-level understanding by guiding the model on where, which is achieved using unlabeled images and masks generated from vision foundation models such as SAM and DINO. To address the challenges of leveraging masks without semantic labels, we devise an online clustering algorithm using learnable class names to acquire general semantic concepts. PixelCLIP shows significant performance improvements over CLIP and competitive results compared to caption-supervised methods in open-vocabulary semantic segmentation.

machine learning, natural language, open-vocabulary semantic segmentation, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.66)
Information Technology > Artificial Intelligence > Machine Learning (0.46)

Add feedback